Day 17 - Regular expressions - Groups
Solutions to exercises
Exercise 17.01
Extract all the lines of simple.log that contain the HTTP method GET or POST, and rewrite each line in
the form <time> <HTTP status> <HTTP method>. The result for the first 10 lines should be:
10:05:03 200 GET
10:05:43 200 GET
10:05:47 200 GET
10:05:12 200 GET
10:05:07 200 GET
10:05:34 200 GET
10:05:57 200 GET
10:05:50 200 GET
10:05:24 200 GET
10:05:50 200 GET
Solution
$ head simple.log | grep -E " (GET|POST) " | sed -r s,".*[0-9]{4}:(.*)] (GET|POST).*\
HTTP/1.[01] ([0-9]{3}).*","\1 \3 \2",
10:05:03 200 GET
10:05:43 200 GET
10:05:47 200 GET
10:05:12 200 GET
10:05:07 200 GET
10:05:34 200 GET
10:05:57 200 GET
10:05:50 200 GET
10:05:24 200 GET
10:05:50 200 GET
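To see what each stage of the pipeline contributes, here is a minimal sketch run on four made-up log lines. The line format is an assumption inferred from the regular expression above (the real simple.log may differ), and the variable names sample, loose, strict and rewritten are hypothetical. It uses sed -E, which GNU sed treats the same as -r.

```shell
# Hypothetical sample lines; the real simple.log format is assumed, not known.
sample='[01/Jan/2020:10:05:03] GET /index.html HTTP/1.1 200
[01/Jan/2020:10:05:04] PUT /BUDGET.xls HTTP/1.1 201
[01/Jan/2020:10:05:05] HEAD /POSTER.png HTTP/1.0 200
[01/Jan/2020:10:05:06] POST /form HTTP/1.1 302'

# Without surrounding spaces, "BUDGET" and "POSTER" also match.
loose=$(printf '%s\n' "$sample" | grep -cE "(GET|POST)")

# With surrounding spaces, only standalone GET and POST match.
strict=$(printf '%s\n' "$sample" | grep -cE " (GET|POST) ")

# Group 1 = time, group 2 = method, group 3 = status; printed as time, status, method.
rewritten=$(printf '%s\n' "$sample" | grep -E " (GET|POST) " |
  sed -E 's,.*[0-9]{4}:(.*)] (GET|POST).*HTTP/1.[01] ([0-9]{3}).*,\1 \3 \2,')

echo "loose=$loose strict=$strict"
echo "$rewritten"
```

On this sample the loose pattern counts all four lines, because the PUT and HEAD requests have BUDGET and POSTER in their URLs, while the space-delimited group keeps only the two real GET and POST requests.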
The idea behind this solution is to find all the lines that contain GET or POST using the logical OR
in a group, so that either can be surrounded by spaces, which helps avoid other mentions of
those letters (for example, a line with a URL that contains “BUDGET” or “POSTER”). Then